    A Primer on Metagenomics

    Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, and the sequence data are noisy and partial. At every step, from sampling to assembly to gene calling and function prediction, bioinformatics faces new demands in interpreting these voluminous, noisy, and often partial data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomics research, so an exhaustive review is beyond the scope of this article. Rather, we provide a concise yet comprehensive introduction to the computational requirements that metagenomics currently presents, and review recent progress. For each method presented, we also note whether software implementing it is available and briefly review its utility. We encourage readers to use this journal's comment section to relate their own experiences. Finally, the last section of this article presents a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.
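    Gene calling is one of the steps listed above. As a rough illustration only, the sketch below scans the forward strand of a DNA sequence for open reading frames running from ATG to the next in-frame stop codon. The function name `find_orfs` and the `min_codons` cutoff are hypothetical; real metagenomic gene callers also scan the reverse complement, allow alternative start codons, handle ORFs truncated at read ends, and score coding potential statistically, all of which matter for the short, partial sequences described above.

```python
# Naive forward-strand ORF scan: an illustrative stand-in for gene calling,
# not a production metagenomic gene caller.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon. Only the first ATG after
    a stop (or the start of the frame) opens an ORF, so nested ORFs sharing
    the same stop are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Length in codons, counting the stop codon.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs
```

    For example, `find_orfs("CATGCCCTGA", min_codons=2)` reports a single ORF in frame 1 spanning positions 1 to 10. Unclosed ORFs at the end of a read are silently dropped here, which is exactly the kind of partial-gene case a real metagenomic caller must handle.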

    Automatic Extraction of Protein Point Mutations Using a Graph Bigram Association

    Protein point mutations are an essential component of the evolutionary and experimental analysis of protein structure and function. While many manually curated databases attempt to index point mutations, most experimentally generated point mutations and the biological impacts of the changes are described in the peer-reviewed published literature. We describe an application, Mutation GraB (Graph Bigram), that identifies, extracts, and verifies point mutations from biomedical literature. The principal problem of point mutation extraction is to link the point mutation with its associated protein and organism of origin. Our algorithm uses a graph-based bigram traversal to identify these relevant associations and exploits the Swiss-Prot protein database to verify this information. The graph bigram method differs from other models for point mutation extraction in that it incorporates frequency and positional data of all terms in an article to drive the point mutation–protein association. Our method was tested on 589 articles describing point mutations from the G protein–coupled receptor (GPCR), tyrosine kinase, and ion channel protein families. We evaluated our graph bigram metric against a word-proximity metric for term association on datasets of full-text literature in these three protein families. Our testing shows that the graph bigram metric achieves a higher F-measure than the word-proximity metric for the GPCRs (0.79 versus 0.76), protein tyrosine kinases (0.72 versus 0.69), and ion channel transporters (0.76 versus 0.74). Importantly, in situations where more than one protein can be assigned to a point mutation and disambiguation is required, the graph bigram metric achieves a precision of 0.84, compared with 0.73 for the word-proximity metric. We believe the graph bigram search metric to be a significant improvement over previous search metrics for point mutation extraction, and to be applicable to other text-mining applications that require the association of words.
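    To make the comparison above concrete, here is a minimal sketch of the simpler baseline, a word-proximity association, together with the F-measure (the harmonic mean of precision and recall) used to score it. It is not the Mutation GraB graph bigram algorithm and does no Swiss-Prot verification. It assumes mutations are written in single-letter notation such as `D130N`, that text is tokenized by whitespace, and that a set of protein names is already known; `nearest_protein`, `MUTATION_RE`, and `f_measure` are hypothetical names.

```python
import re

# Single-letter amino-acid codes; matches mentions such as "D130N" or "Y326F".
# Assumption: three-letter forms like "Asp130Asn" are not handled.
AA = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"\b([{AA}])(\d+)([{AA}])\b")


def nearest_protein(tokens: list[str], proteins: set[str]) -> dict[str, str]:
    """Map each mutation token to the protein name closest in token distance.

    Word-proximity baseline only. Ties go to the earlier protein mention, and
    a mutation mentioned twice keeps its last assignment.
    """
    protein_positions = [i for i, t in enumerate(tokens) if t in proteins]
    result = {}
    for i, tok in enumerate(tokens):
        if MUTATION_RE.fullmatch(tok) and protein_positions:
            j = min(protein_positions, key=lambda p: abs(p - i))
            result[tok] = tokens[j]
    return result


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

    For example, tokenizing "The D130N substitution in rhodopsin differs from EGFR Y326F" and passing `{"rhodopsin", "EGFR"}` links `D130N` to rhodopsin and `Y326F` to EGFR. A pure distance rule like this has no way to break ties or weigh how often a protein is discussed elsewhere in the article, which is the gap the abstract says frequency and positional data are meant to close.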

    PERIOD–TIMELESS Interval Timer May Require an Additional Feedback Loop

    In this study we present a detailed, mechanism-based mathematical framework for Drosophila circadian rhythms. This framework facilitates a more systematic approach to understanding circadian rhythms using a comprehensive representation of the network underlying this phenomenon. The possible mechanisms underlying the cytoplasmic “interval timer” created by PERIOD–TIMELESS association are investigated, suggesting a novel positive feedback regulatory structure. Incorporation of this additional feedback into a full circadian model produced results that are consistent with previous experimental observations of wild-type protein profiles and numerous mutant phenotypes.

    Stereochemical Criteria for Prediction of the Effects of Proline Mutations on Protein Stability

    When incorporated into a polypeptide chain, proline (Pro) differs from all other naturally occurring amino acid residues in two important respects: the φ dihedral angle of Pro is constrained to values close to −65°, and Pro lacks an amide hydrogen. Consequently, mutations that introduce Pro can significantly affect protein stability. In the present work, we describe a procedure to accurately predict the effect of Pro introduction on protein thermodynamic stability. Seventy-seven of the 97 non-Pro amino acid residues in the model protein CcdB were individually mutated to Pro, and the in vivo activity of each mutant was characterized. A decision tree to classify each mutation as perturbing or nonperturbing was created by correlating stereochemical properties of the mutants with the activity data. The stereochemical properties, including the main-chain dihedral angle φ and main-chain amide hydrogen bonds (H-bonds), were determined from 3D models of the mutant proteins built using MODELLER. We assessed the performance of the decision tree on a large dataset of 163 single-site Pro mutations of T4 lysozyme, 74 nonsynonymous single-nucleotide polymorphisms (nsSNPs), and 52 other Pro substitutions from the literature. The overall accuracy of this algorithm was 81% for CcdB, 77% for lysozyme, 76% for nsSNPs, and 71% for the other Pro substitution data. The accuracy of Pro scanning mutagenesis for secondary structure assignment was also assessed and found to be at best 69%. Our prediction procedure will be useful in annotating uncharacterized nsSNPs of disease-associated proteins and for protein engineering and design.
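    The two stereochemical properties named above can be turned into a toy classifier. The sketch below is not the published decision tree: the rule order, the ±30° tolerance around Pro's φ of about −65°, and the names `Site`, `predict_perturbing`, and `accuracy` are all hypothetical. It only shows how the two constraints (no amide hydrogen, restricted φ) translate into per-site rules and how an overall accuracy like those reported is computed.

```python
from dataclasses import dataclass

PRO_PHI = -65.0  # approximate phi adopted by proline, in degrees


@dataclass
class Site:
    phi: float          # main-chain phi of the residue being replaced, degrees
    amide_hbond: bool   # does this residue's backbone NH donate an H-bond?


def predict_perturbing(site: Site, phi_tolerance: float = 30.0) -> bool:
    """Hypothetical two-rule classifier; thresholds are illustrative only."""
    # Rule 1: Pro has no amide hydrogen, so an existing NH H-bond is lost.
    if site.amide_hbond:
        return True
    # Rule 2: phi far from Pro's constrained value implies backbone strain.
    # Wrap the angular difference into [-180, 180) before comparing.
    delta = (site.phi - PRO_PHI + 180.0) % 360.0 - 180.0
    return abs(delta) > phi_tolerance


def accuracy(predicted: list[bool], observed: list[bool]) -> float:
    """Fraction of sites whose predicted class matches the observed class."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("need equal-length, non-empty prediction lists")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

    The angle wrapping matters because φ is periodic: a residue at φ = 170° is far from −65° even though a naive subtraction without wrapping would mislead for values near ±180°.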

    Crystal structure of Bacillus anthracis dihydrofolate reductase with the dihydrophthalazine-based trimethoprim derivative RAB1 provides a structural explanation of potency and selectivity

    Bacillus anthracis possesses an innate resistance to the antibiotic trimethoprim due to poor binding to dihydrofolate reductase (DHFR); currently, there are no commercial antibacterials that target this enzyme in B. anthracis. We have previously reported a series of dihydrophthalazine-based trimethoprim derivatives that are inhibitors of this target. In the present work, we have synthesized one compound (RAB1) displaying favorable 50% inhibitory concentration (54 nM) and MIC (≤12.8 µg/ml) values. RAB1 was cocrystallized with B. anthracis DHFR in space group P2₁2₁2₁, and X-ray diffraction data were collected to 2.3-Å resolution. Binding of RAB1 causes conformational changes in the side chains of Arg58 and Met37 to accommodate the dihydrophthalazine moiety. Unlike the natural substrate or trimethoprim, the dihydrophthalazine group provides a large hydrophobic anchor that embeds within the DHFR active site and accounts for RAB1's selective inhibitory activity against B. anthracis.

    The G Protein–Coupled Receptor Subset of the Chicken Genome

    G protein–coupled receptors (GPCRs) are one of the largest families of proteins, and here we scan the recently sequenced chicken genome for GPCRs. We use a homology-based approach, utilizing comparisons with all human GPCRs, to detect and verify chicken GPCRs from translated genomic alignments and Genscan predictions. We present 557 manually curated sequences for GPCRs from the chicken genome, of which 455 were previously not annotated. More than 60% of the chicken Genscan gene predictions with a human ortholog needed curation, which drastically changed the average percentage identity between the human–chicken orthologous pairs (from 56.3% to 72.9%). Of the non-olfactory chicken GPCRs, 79% had a one-to-one orthologous relationship to a human GPCR. The Frizzled, Secretin, and subgroups of the Rhodopsin families have high proportions of orthologous pairs, although the percentage of amino acid identity varies. Other groups, such as the Adhesion family and GPCRs that bind exogenous ligands, show large differences. The chicken has only three bitter Taste 2 receptors, and it also lacks an ortholog to human TAS1R2 (one of three GPCRs in the human genome in the Taste 1 receptor family [TAS1R]), implying that the chicken's ability and mode of detecting both bitter and sweet taste may differ from the human's. The chicken genome contains at least 229 olfactory receptors, and the majority of these (218) originate from a chicken-specific expansion. To our knowledge, this dataset of chicken GPCRs is the largest curated dataset from a single gene family from a non-mammalian vertebrate. Both the updated human GPCR dataset and the chicken GPCR dataset are available for download.
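    The abstract reports average percentage identity across human–chicken orthologous pairs. The sketch below shows one common way such a per-pair figure is computed from sequences that have already been aligned, and how it is averaged over pairs. The abstract does not state which identity convention the authors used; counting identity over gap-free columns, as here, is an assumption, and `percent_identity` and `mean_identity` are hypothetical names.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two equal-length aligned sequences ('-' = gap).

    Assumption: identity is counted over columns where neither sequence has
    a gap. Other conventions (e.g. dividing by the full alignment length or
    by the shorter sequence) give different, usually lower, values.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def mean_identity(alignments: list[tuple[str, str]]) -> float:
    """Unweighted average of per-pair percent identity."""
    if not alignments:
        raise ValueError("no alignments given")
    return sum(percent_identity(a, b) for a, b in alignments) / len(alignments)
```

    For instance, `percent_identity("MKV-LA", "MKIQLA")` ignores the gapped fourth column and finds 4 identical residues out of 5 compared columns, giving 80.0. Because the choice of denominator changes the result, figures like those above are only comparable when computed under the same convention.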